Setting an Optimal α That Minimizes Errors in Null Hypothesis Significance Tests
Authors
Abstract
Null hypothesis significance testing has been under attack in recent years, partly owing to the arbitrary nature of setting α (the decision-making threshold and probability of Type I error) at a constant value, usually 0.05. If the goal of null hypothesis testing is to present conclusions in which we have the highest possible confidence, then the only logical decision-making threshold is the value that minimizes the probability (or occasionally, cost) of making errors. Setting α to minimize the combination of Type I and Type II error at a critical effect size can easily be accomplished for traditional statistical tests by calculating the α associated with the minimum average of α and β at the critical effect size. This technique also has the flexibility to incorporate prior probabilities of null and alternate hypotheses and/or relative costs of Type I and Type II errors, if known. Using an optimal α results in stronger scientific inferences because it estimates and minimizes both Type I errors and relevant Type II errors for a test. It also results in greater transparency concerning assumptions about relevant effect size(s) and the relative costs of Type I and II errors. By contrast, the use of α = 0.05 results in arbitrary decisions about what effect sizes will likely be considered significant, if real, and results in arbitrary amounts of Type II error for meaningful potential effect sizes. We cannot identify a rationale for continuing to arbitrarily use α = 0.05 for null hypothesis significance tests in any field, when it is possible to determine an optimal α.
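The procedure described above can be sketched numerically. The example below is a minimal illustration, not the authors' implementation: for a one-sided one-sample z-test it finds the critical value whose significance level α minimizes the average of α and β at an assumed critical effect size. The effect size d = 0.5 and sample size n = 30 are illustrative assumptions, and the search uses a simple grid over candidate critical values.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function (stdlib only)."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def average_error(z_crit, d=0.5, n=30):
    """Average of Type I and Type II error for a one-sided z-test.

    d and n are illustrative assumptions: d is the critical
    standardized effect size, n the sample size, so the test
    statistic under the alternative is centered at d * sqrt(n).
    """
    delta = d * math.sqrt(n)
    alpha = 1 - norm_cdf(z_crit)          # Type I error at this cutoff
    beta = norm_cdf(z_crit - delta)       # Type II error at effect d
    return (alpha + beta) / 2

# Grid-search candidate critical values and keep the one that
# minimizes the average error; its alpha is the "optimal alpha".
candidates = [i / 1000 for i in range(0, 4000)]
z_opt = min(candidates, key=average_error)
optimal_alpha = 1 - norm_cdf(z_opt)
```

For these assumed values the optimal α comes out above 0.05 (roughly 0.085), and the average error there is lower than at the conventional α = 0.05 cutoff (z ≈ 1.645), illustrating the abstract's point that 0.05 yields an arbitrary, non-minimal error rate. Prior probabilities or unequal error costs could be incorporated by replacing the plain average with a weighted one.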
Similar resources
TreeFix: Statistically Informed Gene Tree Error Correction using Species Trees – Supplementary Material
In our discussion of hypothesis testing, we said that trees are statistically equivalent if p ≥ α. However, strictly speaking, failing to reject the null hypothesis does not imply that the null hypothesis is true. For example, it could be that enough variability exists in the sequence information to mask the differences in the statistical support of different topologies. We must therefore also ...
False Discovery Rates
In hypothesis testing, statistical significance is typically based on calculations involving p-values and Type I error rates. A p-value calculated from a single statistical hypothesis test can be used to determine whether there is statistically significant evidence against the null hypothesis. The upper threshold applied to the p-value in making this determination (often 5% in the scientific li...
Design of the Fuzzy Rank Tests Package
denote the critical function of a randomized test having significance level α and point null hypothesis θ, that is, the randomized test rejects the null hypothesis θ = θ0 at level α when the observed data are x with probability φ(x, α, θ0). The requirement that φ(x, α, θ) be a probability restricts it to being between zero and one (inclusive). The requirement that the test have its nominal leve...
The Optimal Discovery Procedure: A New Approach to Simultaneous Significance Testing
Significance testing is one of the main objectives of statistics. The Neyman-Pearson lemma provides a simple rule for optimally testing a single hypothesis when the null and alternative distributions are known. This result has played a major role in the development of significance testing strategies that are used in practice. Most of the work extending single testing strategies to multiple tests...
Guidelines for Multiple Testing in Impact Evaluations of Educational Interventions
A. INTRODUCTION Studies that examine the impacts of education interventions on key student, teacher, and school outcomes typically collect data on large samples and on many outcomes. In analyzing these data, researchers typically conduct multiple hypothesis tests to address key impact evaluation questions. Tests are conducted to assess intervention effects for multiple outcomes, for multiple su...